Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Kubernetes 1.24+ #294

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

skidder
Copy link
Contributor

@skidder skidder commented Aug 20, 2023

This PR updates the operator to use Kubernetes 1.24 APIs, tested on Kubernetes 1.27.

Changes

  1. Updates project dependencies in go.mod to use the K8S 1.24 APIs
  2. Updates the WordCount example application to use Flink 1.16 (previously Flink 1.8, which predates Flink's new memory management configs)
  3. Configures the SubmitJobRequest struct type to consider the fields all as omitempty so they're included in the SubmitJob only if non-empty. This resolves a problem with newer versions of Flink that attempted to start the job from an empty-string savepoint URI, which would consistently fail. These fields are optional, so omitting them should be fine.
  4. Introduces a Zap logging wrapper since the Prometheus logging package is dead.

###Testing
This was developed and tested against a k3d cluster of Kubernetes 1.27 with the following specs:

Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4+k3s1", GitCommit:"36645e7311e9bdbbf2adb79ecd8bd68556bc86f6", GitTreeState:"clean", BuildDate:"2023-07-28T09:46:05Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/arm64"}

Steps

1. Build Operator

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator
→ make
IMAGE_NAME=$REPOSITORY ./boilerplate/lyft/docker_build/docker_build.sh

------------------------------------
           DOCKER BUILD
------------------------------------

[+] Building 28.6s (17/17) FINISHED                                                                                                                  docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                 0.0s
 => => transferring dockerfile: 895B                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/alpine:3.17                                                                                                       1.0s
 => [internal] load metadata for docker.io/library/golang:1.20.2-alpine3.17                                                                                          1.8s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                                                                        0.0s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                        0.0s
 => [builder 1/7] FROM docker.io/library/golang:1.20.2-alpine3.17@sha256:87734b78d26a52260f303cf1b40df45b0797f972bd0250e56937c42114bf472c                            0.0s
 => CACHED [stage-1 1/2] FROM docker.io/library/alpine:3.17@sha256:f71a5f071694a785e064f05fed657bf8277f1b2113a8ed70c90ad486d6ee54dc                                  0.0s
 => [internal] load build context                                                                                                                                    0.1s
 => => transferring context: 203.13kB                                                                                                                                0.1s
 => CACHED [builder 2/7] RUN apk add git openssh-client make curl bash                                                                                               0.0s
 => CACHED [builder 3/7] COPY go.mod go.sum /go/src/github.com/lyft/flinkk8soperator/                                                                                0.0s
 => CACHED [builder 4/7] WORKDIR /go/src/github.com/lyft/flinkk8soperator                                                                                            0.0s
 => CACHED [builder 5/7] RUN go mod download                                                                                                                         0.0s
 => [builder 6/7] COPY . /go/src/github.com/lyft/flinkk8soperator/                                                                                                   0.2s
 => [builder 7/7] RUN go mod vendor && make linux_compile                                                                                                           25.8s
 => [stage-1 2/2] COPY --from=builder /artifacts /bin                                                                                                                0.1s
 => exporting to image                                                                                                                                               0.1s
 => => exporting layers                                                                                                                                              0.1s
 => => writing image sha256:01be975d7ac4f16e3dce41a98088a6ea5e7865fd6ab08f40960e1312f8f0f63c                                                                         0.0s
 => => naming to docker.io/library/flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b                                                                         0.0s

What's Next?
  View summary of image vulnerabilities and recommendations → docker scout quickview
flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b built locally.

Next, tag the image and push to my local Docker image registry:

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator
→ docker tag flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b localhost:56230/flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator
→ docker push localhost:56230/flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b
The push refers to repository [localhost:56230/flinkk8soperator]
1b411adf446d: Pushed
d2d9d24a8c2a: Layer already exists
ed051403f4b52fd3e9c0ee429a2e476b74356e4b: digest: sha256:3e203d359db91d5ea61b0bbc47b28f2b5bfc79505552b7ed07663f8b8abff94a size: 739

2. Deploy the Operator YAMLs

Update the deploy/flinkk8soperator.yaml file to use the image tag and name of my local image registry:

        image: flink-operator-dev-registry:56230/flinkk8soperator:ed051403f4b52fd3e9c0ee429a2e476b74356e4b

Apply all the YAMLs in deploy using the Quick Start Guide.

3. Build the Word Count app

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator/examples/wordcount
→ docker build . -t flink-wordcount:latest
[+] Building 9.4s (15/15) FINISHED                                                                                                                   docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                      0.0s
 => [internal] load build definition from Dockerfile                                                                                                                 0.0s
 => => transferring dockerfile: 362B                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/flink:1.16.2-scala_2.12-java11                                                                                    1.0s
 => [internal] load metadata for docker.io/library/maven:3.9.3                                                                                                       1.0s
 => [auth] library/flink:pull token for registry-1.docker.io                                                                                                         0.0s
 => [auth] library/maven:pull token for registry-1.docker.io                                                                                                         0.0s
 => [builder 1/4] FROM docker.io/library/maven:3.9.3@sha256:de70becda2f183567e105530460b6cbedfc014b64f27c2003d36d7fc85bc222f                                         0.0s
 => [internal] load build context                                                                                                                                    0.0s
 => => transferring context: 1.74kB                                                                                                                                  0.0s
 => CACHED [stage-1 1/3] FROM docker.io/library/flink:1.16.2-scala_2.12-java11@sha256:f11b1cd44c964cd644d0ee6b01281f3c5a57f2c7a519939c9529e6cbbd96f615               0.0s
 => CACHED [builder 2/4] COPY src /usr/src/app/src                                                                                                                   0.0s
 => [builder 3/4] COPY pom.xml /usr/src/app                                                                                                                          0.0s
 => [builder 4/4] RUN mvn -f /usr/src/app/pom.xml clean package                                                                                                      8.2s
 => [stage-1 2/3] COPY --from=builder /usr/src/app/target/ /code/target                                                                                              0.0s
 => [stage-1 3/3] RUN ln -s /code/target /opt/flink/flink-web-upload                                                                                                 0.1s
 => exporting to image                                                                                                                                               0.0s
 => => exporting layers                                                                                                                                              0.0s
 => => writing image sha256:b1615553beb1c657957d2825d7302b472961abe93e63ce87fcb8658b5525b18e                                                                         0.0s
 => => naming to docker.io/library/flink-wordcount:latest                                                                                                            0.0s

Tag and push the image to my local registry:

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator/examples/wordcount
→ docker tag flink-wordcount:latest localhost:56230/flink-wordcount:latest

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator/examples/wordcount
→ docker push localhost:56230/flink-wordcount:latest
The push refers to repository [localhost:56230/flink-wordcount]
662da9bda804: Pushed
103047c74d48: Pushed
18c49fee7a8c: Pushed
67376cc00cc7: Pushed
63bd38732135: Pushed
a70c997c3eb4: Pushed
f087e5bef299: Layer already exists
d493a7b98dcf: Layer already exists
4fb17f51b932: Layer already exists
5a04c4a9f49c: Layer already exists
4b97ee001b98: Layer already exists
22236f4b6e33: Layer already exists
1b9698f962dc: Layer already exists
latest: digest: sha256:bf33ff2f8b9e2fb23ee9845744a174a94775b308bc34884c1335295c17c3574f size: 3040

4. Deploy Word Count

Update the examples/wordcount/flink-operator-custom-resource.yaml file to reference the new image with:

  image: flink-operator-dev-registry:56230/flink-wordcount:latest
  imagePullPolicy: Always
scott /Users/scott/go/src/github.com/lyft/flinkk8soperator/examples/wordcount
→ kubectl get FlinkApplication
No resources found in flink-operator namespace.

Apply the YAML:

scott /Users/scott/go/src/github.com/lyft/flinkk8soperator/examples/wordcount
→ kc apply -f flink-operator-custom-resource.yaml
flinkapplication.flink.k8s.io/wordcount-operator-example created

Monitor the Flink Application:

→ kubectl get FlinkApplication -w
NAME                         PHASE             CLUSTER HEALTH   JOB HEALTH   JOB RESTARTS   AGE
wordcount-operator-example   ClusterStarting                                                29s
wordcount-operator-example   Savepointing                                                   49s
wordcount-operator-example   SubmittingJob                                                  49s
wordcount-operator-example   SubmittingJob                                                  49s
wordcount-operator-example   SubmittingJob     Green                                        49s

the job enters a running state. This can happen for really
brief jobs, like the example WordCount job. Also, address
Kubernetes API changes that require retrieval of the status
object to update its status.
@premsantosh
Copy link
Contributor

Thank you so much @skidder for the contribution. The thing is we are also internally working on upgrading the compatibility to higher k8s version. We will look into this PR and see how we can either leverage the work here or test this out internally. cc @bhanuvikasr @sethsaperstein-lyft

@sethsaperstein-lyft
Copy link
Contributor

Thanks for the contribution @skidder. I approved the integration tests and it looks like there are some issues. These integration tests can be run locally as well. Instructions can be found here. Feel free to reach out if you have any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants